(54) Active neural network control of wafer attributes in a plasma etch process.

(57) The present invention is predicated upon the fact that an emission trace from a plasma glow used in 
fabricating integrated circuits contains information about phenomena which cause variations in the 
fabrication process such as age of the plasma reactor, densities of the wafers exposed to the plasma, 
chemistry of the plasma, and concentration of the remaining material. In accordance with the present 
invention, a method for using neural networks to determine plasma etch end-point times in an 
integrated circuit fabrication process is disclosed. The end-point time is based on in-situ monitoring of 
the optical emission trace. The back-propagation method is used to train the network. More generally, a 
neural network can be used to regulate control variables and materials in a manufacturing process to 
yield an output product with desired quality attributes. An identified process signature which reflects 
the relation between the quality attribute and the process may be used to train the neural network. 

Technical Field 

This invention relates to a method for use of neural networks to regulate control variables and materials 
in manufacturing processes such as those employing plasmas. The invention is illustrated by a preferred embodiment in which a neural network, responsive to a plasma glow process, is used to control etch time in plasma etching. 

Background of the Invention 

Plasma processes are important in the aerospace, solar energy, paper and textile industries as well as in the electronics industry for the fabrication of integrated circuits and optoelectronic devices. See National Research Council, Plasma Processing of Materials, National Academy Press, Washington, D.C., 1991. For example, plasmas are used both to etch and to deposit thin film layers on integrated circuit substrates. 

A plasma is an ionized gas in which concentrations of positive and negative ions are almost equal. The plasma may also contain free radicals which are electrically neutral yet highly reactive. A plasma is formed by introducing a desired gas into a reactor or chamber and applying a radio frequency (RF) field to the chamber. The gas introduced is typically chosen to participate in the chemistry of the desired process, as for example chlorine gas in etching polysilicon in the fabrication of integrated circuits. The RF field causes electron collisions with neutral or charged species which then emit radiation, producing a glow discharge or emission. 

Plasma etching is the selective removal of material by the reactive free radicals or ions generated within the plasma. In many cases, the plasma etching process is superior to wet etching techniques (where material is etched by liquid chemicals) in terms of exactness of the etching and process control. See generally, R. G. Poulsen, "Plasma Etching in Integrated Circuit Manufacture - A Review," J. Vac. Sci. Tech., Vol. 14, No. 1, 266-274 (Jan./Feb. 1977). 

Plasma processes are generally difficult to control. See, e.g., National Research Council at 34-35. For example, the plasma etching process must be continuously monitored to compensate for variations. One cause of variation in the process is the aging of the reactor. The etch time for a freshly cleaned reactor chamber is different from the etch time for a reactor that has been in production use for a time. Also, wafers having different pattern densities etch differently. Such changes necessitate continual inspection to maintain the quality of the product. Based on the inspection results, a decision is made on the etch time for the next lot. However, the requirement for continuous human intervention to account for the effects of machine aging and cleaning leads to run-to-run variations in wafer attributes or characteristics between lots. Thus, there is a need for an accurate control mechanism to adjust the etch times between lots without continuous human intervention. 

Summary of the Invention 

The present invention uses neural networks to govern or regulate input control variables and materials used in manufacturing processes to yield an output product with desired quality attributes. The method is particularly useful in controlling plasma processes, and it avoids many of the costs, delays and inconsistencies associated with prior methods. In the preferred embodiment, a neural network controller monitors a portion of the optical emission trace during the plasma etching process and computes the plasma etch end-point time based on this observation. The network is trained directly with production data measurements of product quality using the back-propagation technique. An automated etch time control process offers advantages in terms of greater uniformity, higher yields and lower costs. 

Brief Description of the Drawings 

Other features and advantages of the invention will become apparent from the following detailed description taken together with the drawings in which: 

Figure 1 illustrates a plasma etching step in fabricating an integrated circuit. 

Figure 2 illustrates a neural network process monitor for regulating a manufacturing process. 

Figure 3 illustrates a plot of a typical emission trace. 

Figure 4 illustrates a block diagram system for training a neural network. 

Figure 5 illustrates the average resulting gate thickness as a function of the number of learning trials for the neural network. 

Figure 6 illustrates the performance of the neural network as a function of data base size. 

Figure 7 illustrates a neural network controller which monitors input control variables and materials to regulate a manufacturing process. 

Figure 8 illustrates a schematic of a feedforward neural network. 

Figure 9 illustrates the adaptive procedure for adjusting the weight matrix elements. 

Figure 10 illustrates a generic layer in a feedforward network. 

Figure 11 illustrates a neural network process monitor and database for regulating a manufacturing process. 

Figure 12 illustrates a block diagram system for training a neural network using database information. 

Detailed Description 

I. Introduction 

Figure 1 illustrates a typical use of the plasma etching process as one step in fabricating a MOS transistor. Silicon wafer substrate 10 is covered by oxide layer 12. Oxide layer 12 is then covered with polysilicon layer 14 and tantalum silicon layer 16. Oxide layer 12, typically silicon dioxide, has a well 13 in it. A layer of photoresist material 18 is applied to a portion of well 13, and tantalum silicon layer 16 and polysilicon layer 14 are etched away. 

In etching away polysilicon layer 14, it is important that all of polysilicon layer 14 be removed. However, in etching polysilicon layer 14 completely, a portion 21 of oxide layer 12 will inevitably be etched. In the etching process, a critical quality attribute is remaining oxide thickness 20 in the source region 22 and drain region 24, which determines the characteristics of these regions. Remaining oxide thickness 20 is a function of the etch time, i.e. the period for which the wafer is exposed to the plasma. 

Figure 2 presents an illustrative embodiment of the present invention in which a neural network is advantageously incorporated into process monitor 202 which regulates or governs the materials and control variables input into process 204 to produce a desired quality attribute in the output product. Section II presents an illustrative embodiment of the invention in which the plasma etch time in the fabrication of integrated circuits is controlled by a neural network process monitor which uses a portion of the trace of the optical emission to produce a desired oxide thickness on a wafer. An overview of neural network operation and the preferred back-propagation training technique are discussed in Section III. 

II. A Neural Network Controller 

A trace or record may generally be defined as a measurement over time of a specific variable or function. In the preferred embodiment a portion of the trace of the optical emission spectrum from the plasma glow, measured at a specified wavelength as a function of time, is used as a process signature. A process signature reflects, or has embedded in it, information related to quality attributes and to the process itself as well as information about factors which make the process difficult to control. The optical emission trace reflects information directly related to the chemistry of the plasma and information regarding the concentration of the material etched away. Indirectly, its behavior contains information about aging of the machine, pattern density of the wafers, non-ideal fluctuations in gas flow, pressure, RF power, etc. This information embedded in the optical trace is adequate for neural network mapping or training to predict and control the ideal etch time for a desired oxide thickness. See W. T. Miller, R. S. Sutton and P. J. Werbos, Neural Networks for Control, MIT Press, Cambridge, MA (1990) for a collection of papers on neural networks for control. 

Other process signatures may also be identified and used to train a neural network and to control a process. In some cases, for example, traces of the input control variables and materials, such as power, temperature, pressure, etc., may be process signatures. In these cases, a set point, i.e. a desired or fixed value, is generally selected for each of the inputs. The inputs will fluctuate around these set points. The variations may reflect the progress of the process in producing an output with specified quality attributes, and thus these traces may be process signatures which can be used to train the neural network and to control the process. 

Figure 3 is a plot of a typical emission trace which in the preferred embodiment indicates the amount of chlorine gas in the plasma. The chlorine gas emission was measured at a wavelength of 837 nanometers. The time units are based on the frequency of data collection which, in this case, is approximately 1.9 sec. per unit. This trace is for a two-step gate etch in which the first etch (from t=25 to t=115) is a TaSi etch and the second is a polysilicon etch. During the polysilicon etch step some of the underlying oxide in the exposed source and drain regions will inevitably be etched. The amount etched is dependent on the selectivity of the polysilicon etch conditions toward the oxide, and is a parameter of critical concern in the device fabrication. At time unit t=52 the instrument is auto-zeroed, which normalizes the trace data. The auto-zeroing or normalizing step is important because the trace has information not only about emission intensity but also about the last cleaning of the chamber. Thus, the auto-zero step adjusts the signal to allow for fogging of the optical windows as the machine ages. After the TaSi etch, the wafer is transferred to a second chamber for the polysilicon etch under a different chemistry. The spike at t ≈ 115 occurs at the turn-on of the RF power for the polysilicon etch. One current method of etch end-point detection is to observe the time at which the trace crosses a threshold, and then to etch for an additional time which is a predetermined fraction of the initial etch time. In the example shown in Figure 3, there is a threshold point at about 82 time units. This represents the end-point of the TaSi etch and the time from 82 to 115 is the TaSi over-etch time. The threshold point at about 145 signals the end-point of the polysilicon etch and the time from 145 to 170 is the over-etch time for the polysilicon etch. 
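
The threshold-plus-over-etch rule described above may be sketched as follows. This is a minimal illustration only: the function name, the synthetic trace and the choice to treat a crossing in either direction as the end-point are assumptions, not details taken from the description.

```python
def threshold_endpoint(trace, start, threshold, overetch_fraction):
    """Return (end_point, stop_time) in time units, or None if no crossing is found.

    trace             -- emission intensities, one sample per time unit
    start             -- time unit at which the search for the end-point begins
    threshold         -- intensity level whose crossing marks the end-point
    overetch_fraction -- over-etch time as a fraction of the initial etch time
    """
    for t in range(start + 1, len(trace)):
        was_above = trace[t - 1] >= threshold
        is_above = trace[t] >= threshold
        if was_above != is_above:              # the trace crossed the threshold
            initial_etch = t - start
            overetch = round(overetch_fraction * initial_etch)
            return t, t + overetch
    return None


# Synthetic trace (hypothetical data): a flat level that drops after ten samples.
demo_trace = [1.0] * 10 + [0.2] * 10
result = threshold_endpoint(demo_trace, start=0, threshold=0.5, overetch_fraction=0.5)
print(result)
```

For the polysilicon etch of Figure 3, an etch starting at about 115, an end-point at about 145 and a stop at about 170 would correspond to an over-etch fraction of 25/30. On a real trace the search would have to begin after the turn-on spike so that the spike is not mistaken for the end-point.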

Since the primary quality parameter of the process is the thickness of the oxide remaining in the source and drain regions after the etch, trace data for the polysilicon etch will be used to compute the ideal etch time for that process. The neural network controller illustratively uses as input twenty-four measurements of the polysilicon etch's emission trace. The first seven measurements typically are skipped to avoid the spike at turn-on. The next twenty-four data points typically constitute the first one-third of the total etching time. This ensures that the input data can be collected well before the etch is completed. This input is used to train the neural network to compute the optimum total etch time to obtain a target oxide thickness in the source and drain regions. 
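
The selection of the network input from the polysilicon etch trace may be sketched as follows. The function names are hypothetical, and the sketch assumes the trace is supplied as a sequence of samples beginning at RF turn-on, with the 1.9 sec. sampling interval quoted for Figure 3.

```python
SKIP = 7       # leading samples dropped to avoid the spike at turn-on
WINDOW = 24    # samples presented to the neural network


def network_input(poly_trace, skip=SKIP, window=WINDOW):
    """Return the `window` samples that follow the first `skip` samples."""
    if len(poly_trace) < skip + window:
        raise ValueError("trace is too short to fill the input window")
    return list(poly_trace[skip:skip + window])


def seconds_until_input_ready(skip=SKIP, window=WINDOW, seconds_per_unit=1.9):
    """Elapsed etch time before the full input window has been collected."""
    return (skip + window) * seconds_per_unit


# Hypothetical trace: sample values equal to their own index.
sample_input = network_input(list(range(40)))
print(len(sample_input), sample_input[0], sample_input[-1])
```

Under these assumptions the input window is complete after 31 samples, or roughly 59 seconds of the polysilicon etch.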

Figure 4 is an illustrative diagram of the method of training the neural network. In the preferred embodiment, the neural network 402 is a 24x5x1 architecture (i.e. 24 input nodes, 5 hidden nodes and 1 output node) and is trained with the back-propagation technique (see Section III below). Results from a production database may be used for the training. Since such results will show a spread in observed oxide thicknesses, it becomes necessary to compute for each run in the database the time that would have been required to achieve the optimal result. The oxide thickness and etch time from each run can be corrected, using the known rate of oxide etching in the polysilicon etch chamber, to compute a first-order correction on the etch time. This corrected etch time is then used as the target for training the neural network. Thus, the network is trained directly with measurements of the attributes which determine product quality. 

The first order linear correction process 405 computes the ideal etch time. If the oxide thickness $T_{obs}$ and the etch time $t_{obs}$ from some real world observations are known, then given the etch rate $E_R$ and the desired thickness $T_{des}$ we can compute an ideal etch time for this desired thickness: 

$$t_{ideal} = t_{obs} + \frac{T_{obs} - T_{des}}{E_R}$$


The difference between the ideal etch time and the neural network's guessed or estimated etch time is fed back to the neural network as an error signal from which the neural network can compute new values in the weight matrices. In all of the examples in the database, the observed etch time is close to the ideal one, and the first-order corrections are small. It is important to keep in mind, however, that this kind of linear correction is only valid for making these small corrections near the end of the etch process. The earlier steps in the etch sequence, which encounter different layers and transient effects at turn-on time, are highly nonlinear and less well-behaved. These steps will be accounted for in an actual experimental database. 
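
The first-order correction and the resulting error signal may be sketched as follows. The function names and the numerical values are hypothetical and serve only to show the arithmetic; thicknesses are taken in angstroms and times in trace time units.

```python
def ideal_etch_time(t_obs, thickness_obs, thickness_des, etch_rate):
    """First-order corrected etch time for a run in the database.

    t_obs         -- etch time actually used in the run
    thickness_obs -- oxide thickness observed after the run
    thickness_des -- desired oxide thickness
    etch_rate     -- oxide etch rate in the polysilicon etch chamber
    """
    if etch_rate <= 0:
        raise ValueError("etch rate must be positive")
    # Too much oxide left means the etch was too short, so time is added.
    return t_obs + (thickness_obs - thickness_des) / etch_rate


def error_signal(t_ideal, t_estimated):
    """Difference fed back to the network during training."""
    return t_ideal - t_estimated


# Hypothetical run: 160 angstroms observed, 151 desired, 3 angstroms per time unit.
t_ideal = ideal_etch_time(t_obs=55.0, thickness_obs=160.0,
                          thickness_des=151.0, etch_rate=3.0)
print(t_ideal, error_signal(t_ideal, t_estimated=56.0))
```

In this example the run left 9 angstroms too much oxide, so the corrected target is three time units longer than the observed etch time.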

In training a neural network, as in any statistical learning tool, the data set may be partitioned into two sections. One section is used for training and the other section is used for testing. The preferred embodiment started with 650 examples, from which 50 were separated for testing, and 600 were used to train the network. Across these 650 examples the pattern density of the product being etched varied by as much as 50%. This pattern density information is, of course, also embedded in the trace file so it was not explicitly considered. It is assumed in such an approach that there is enough information in the trace file to accurately predict the correct etch time and that pattern density need not be explicitly included in the network's input. The training consisted of repeated random selection of examples from the training set and back-propagating the resulting errors to update the connection matrices. 
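
The partitioning of the examples and the repeated random selection may be sketched as follows. The description does not state how the 50 test examples were chosen; the random split below, the seed values and the function names are assumptions.

```python
import random


def partition(examples, n_test, seed=0):
    """Split examples into (training set, test set); the random split is an assumption."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def training_stream(train_set, n_trials, seed=1):
    """Yield one randomly selected training example per learning trial."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield rng.choice(train_set)


# Integers stand in for the 650 (trace, corrected etch time) examples.
train_set, test_set = partition(range(650), n_test=50)
first_trials = list(training_stream(train_set, n_trials=5))
print(len(train_set), len(test_set), first_trials)
```

Because selection is made with replacement, a given example may be presented many times over the course of training while the test set is never used to update the connection matrices.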

In the preferred embodiment the training was stopped after every 1000 trials and the performance of the network was tested with the test set. These results are shown in Figure 5. The oxide thickness before etch was 208 Å. Since the network before training has very small weights, its initial output was near zero etch time, so the resulting oxides in the test data would have been unetched. After repeated training the network guesses improved, until after 50,000 training cycles the computed etch times were resulting in an average oxide thickness of 145 Å, compared with a target value equal to the value for human assisted methods of 151 Å. More importantly, the current, human assisted method results in a standard deviation of 15.1 Å in the oxide thicknesses. The neural network showed an equivalent spread of 15.0 Å. 

There are various methods to improve the neural network learning performance. See J. Denker et al., "Large Automatic Learning, Rule Extraction, and Generalization," Complex Systems, Vol. 1, 877-922 (1987). The most common is to adjust the complexity of the network. But there is a trade-off between the complexity and the error. A network that is too complex will essentially build a lookup table with the training data and perform poorly on testing data. The network in the illustrative embodiment has been optimized for minimum complexity while still achieving a low error. It should also be apparent that as the amount of training data increases the testing error should decrease. Figure 6 is a plot which demonstrates this. The figure shows that as the training set size increases, the standard deviation of the network error decreases. Thus, a tighter process control over the oxide thickness remaining after the plasma etch may be expected. 

Figure 7 illustrates another embodiment of the invention. In this embodiment, the neural network is again trained with production data measurements of quality attributes. In this embodiment, the process signatures monitored by the neural network may be portions of the traces of a plurality of input control variables and materials. For example, RF power, gas flow and pressure could be advantageously used to compute the total etch time. As with the preferred embodiment, it is only required that the neural network monitor some physical characteristics or manifestations of the complete manufacturing process which reflect or have embedded in them information sufficient to assure process quality as measured by attributes of the output product. 

Figure 11 illustrates a further embodiment of the invention in which neural network process monitor 202 advantageously uses input information from database 1106 as well as input from the process signature to regulate the material and control variables input to process 204 to produce an output product with the desired quality attributes. Database 1106 may be any information storage system, but preferably any such system should be capable of dynamically accessing information in real time in response to interrogations from neural network process monitor 202. The additional information in database 1106 can result in improved network training and control of process variables. 

In the context of the plasma etching process described above, database 1106 may contain information regarding process signatures (other than optical emission trace) and parameters (e.g. number of chips per wafer; RF power, DC bias and etch times from previous lots; and statistical data relating to observed oxide thickness in previous lots). More generally, the type of information in database 1106 will vary depending on the type and variety of input materials and control variables as well as the process signatures and process variables used for a given process. 

As noted above, the information in database 1106 may be used to train the neural network as shown in Figure 12. The training process is similar to that described for and shown in Figure 4. Figure 12 illustrates that other process signatures as well as observations and statistics of previous and current lots may be used to train the neural network. 

Apart from the practical aspects of making better etch end-point detectors for plasma etching, this technique demonstrates the ability of a neural network to extract useful information from complex data. The physics needed to monitor the progress of a plasma etching procedure based on the optical emission trace alone would probably be prohibitively difficult. However, the neural network simply learns by trial and error to extract the appropriate information. Those skilled in the art will recognize that this technique can be applied to many similar control problems provided the data in the process monitor has embedded within it the necessary information. 

III. Neural Networks 

Neural networks are loosely modeled after biological neural networks. The underlying physical models for these networks have developed from early attempts to model the behavior of their biological counterparts. These networks generally consist of layers of active elements, neurons, coupled together by variable strength connection matrices simulating the synapses found in biological networks. Layered networks of interconnected, nonlinear neurons are finding increasing use in problems such as pattern recognition, signal processing, systems identification, speech processing and robotics and machine control. Their most powerful advantage is their ability to adaptively learn to emulate the behavior of other nonlinear systems. This adaptive learning process is based on a trial and error approach in which the network is gradually modified until it exhibits the desired characteristics. For a historical review of neural network techniques, see B. Widrow & M. A. Lehr, "30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation," Proc. IEEE, Vol. 78, No. 9, 1415-1442 (Sept. 1990). 

Figure 8 is a schematic of a typical feedforward neural network in which neurons 805-811 (also called nodes) are indicated by rectangles and synaptic connections 815-816 are indicated by the lines connecting each node. A distinguishing characteristic of these networks is the absence of feedback. Signals flow in parallel paths through them, connecting only to nodes further downstream. In analog electronic hardware versions of these networks, this highly parallel architecture allows them to perform at a high equivalent computational rate. Software networks, based on this physical model, do not realize the same benefits of parallelism, but can still exploit the ability of these networks to adaptively learn. The algorithms that have been developed to design neural networks that do particular computational tasks apply equally well to both hardware and software networks. 

Layers of neurons that are not directly connected to the output are often called hidden layers. Only one hidden layer, nodes 805-809, is shown in Figure 8. Each neuron acts as a summing node, collecting a weighted sum of all the signals in the previous layer. In addition, each has associated with it response curve 820 called its activation function. The hidden neurons are usually chosen to have a sigmoidal, or step shaped, activation function. The hyperbolic tangent, arc tangent and Fermi functions are examples of mathematical functions that are commonly used in software networks, all of which have the correct shape and give comparable results. Their sigmoidal shape is important. The nonlinearity can be used by the adaptive network to construct nonlinear output responses. However, their saturating nature keeps the response of the overall network nicely bounded, preventing the uncontrolled behavior that can result from other nonlinear methods like polynomial curve fitting. The hyperbolic tangent function is illustratively used in the preferred embodiment. The activation function for output neurons 810-811 in the last layer, on the other hand, is usually linear. This gives an unlimited dynamic range, allowing the network to generate arbitrarily large output signals. 
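
The saturating behavior of the three sigmoidal functions named above may be illustrated as follows. The function names and the 2/pi scaling applied to the arc tangent are assumptions made for the comparison; a cubic term is included only to contrast bounded and unbounded responses.

```python
import math


def tanh_activation(x):
    """Hyperbolic tangent, bounded between -1 and 1."""
    return math.tanh(x)


def arctan_activation(x):
    """Arc tangent scaled by 2/pi (an assumed scaling) to lie between -1 and 1."""
    return (2.0 / math.pi) * math.atan(x)


def fermi_activation(x):
    """Fermi (logistic) function, bounded between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)            # this branch avoids overflow for large negative x
    return z / (1.0 + z)


def cubic(x):
    """An unbounded polynomial term, shown for contrast."""
    return x ** 3


for x in (-100.0, -1.0, 0.0, 1.0, 100.0):
    print(x, tanh_activation(x), arctan_activation(x), fermi_activation(x), cubic(x))
```

For large inputs the three sigmoidal functions level off while the polynomial term grows without limit, which is the difference in behavior the paragraph above relies on.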

The interconnections between the layers of neurons are variable connection strength coefficients in a weight matrix. In a software network, the physical architecture shown in Figure 8 is described by the equation 

$$o_k = \sum_j W2_{jk} \tanh\left(\sum_i W1_{ij}\, i_i\right) \tag{1}$$

This relation describes $o_k$, the linear output of the kth neuron in the output layer, for a set of inputs $i_i$. Equations for networks with more hidden layers would contain an additional summation and nonlinear activation function for each additional layer. The matrix W1 is the input-to-hidden layer matrix, and the W2 matrix is the hidden-to-output matrix. 
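
Equation 1 may be evaluated in software as sketched below. The function name and the nested-list layout of the matrices (w1[i][j] for input i to hidden neuron j, w2[j][k] for hidden neuron j to output k) are assumptions made for illustration.

```python
import math


def forward(inputs, w1, w2):
    """Evaluate Equation 1 and return (hidden signals, outputs).

    w1[i][j] couples input i to hidden neuron j (input-to-hidden matrix).
    w2[j][k] couples hidden neuron j to output k (hidden-to-output matrix).
    """
    n_in, n_hidden, n_out = len(w1), len(w2), len(w2[0])
    hidden = [math.tanh(sum(inputs[i] * w1[i][j] for i in range(n_in)))
              for j in range(n_hidden)]
    outputs = [sum(hidden[j] * w2[j][k] for j in range(n_hidden))
               for k in range(n_out)]
    return hidden, outputs


# Small hypothetical network: 2 inputs, 2 hidden neurons, 1 linear output.
w1 = [[0.0, 1.0],
      [0.0, 0.0]]
w2 = [[1.0],
      [2.0]]
hidden, outputs = forward([1.0, 0.0], w1, w2)
print(hidden, outputs)
```

For the 24x5x1 architecture of the preferred embodiment, w1 would hold 24 x 5 = 120 coefficients and w2 would hold 5, assuming no terms beyond those written in Equation 1. A network whose weights are all zero returns zero output, consistent with the near-zero initial etch times noted in Section II.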

Notice that apart from the number of layers and the number of neurons in each layer, the basic architecture and elements within the network are generic. One network of this kind looks more or less like any other. The function of the network is not determined by its active elements, the neurons, but rather by the interconnections among them. The basic idea, shown schematically in Figure 9, is to adjust the values of the individual connections $W1_{ij}$ and $W2_{jk}$ to minimize the root mean square (rms) error in the output. This process is done gradually, i.e. the changes to the connections after each individual trial are small. With each change, however, the network more closely approximates the desired response. 

A widely used adaptive method to compute the changes in the connection matrices is the back-propagation of errors technique. D. E. Rumelhart, G. E. Hinton & R. J. Williams, "Learning Internal Representations by Error Propagation," in D. E. Rumelhart, J. L. McClelland & the PDP Research Group, Parallel Distributed Processing, Vol. 1, 318-330, MIT Press, Cambridge, MA (1986). See also, E. A. Rietman and R. C. Frye, "The Back Propagation Algorithm for Neural Networks," Algorithm, Vol. 1.4, 17, May/June 1990. The training of a neural network by the back-propagation technique consists of adjusting the elements of several matrices in order to minimize the error between the network output and the target response. This method is a generalization of the delta rule, see Rumelhart, supra, at 321, and is based on a gradient descent optimization technique. It attempts to minimize the mean-squared error in the output of the network, as compared to a desired response. 
For a network having multiple outputs, the rms error is given by 

$$E = \left[\sum_k (t_k - o_k)^2\right]^{1/2} \tag{2}$$

where $t_k$ and $o_k$ are the target and the output values for the kth component of the vectors. If the network is time invariant, then its output will depend only on its inputs, $i_i$, and the current value of the connection weight matrices, $W1_{ij}$ and $W2_{jk}$, as in Equation 1. For a particular input vector the error is determined by the values of the weighting coefficients that connect the network. The method used in the adaptive procedure is to change these connections by an amount proportional to the gradient of the error in weight space, i.e. for a given weight coefficient $w_{ij}$, 

$$\Delta w_{ij} \propto -\frac{\partial E}{\partial w_{ij}} \tag{3}$$

This technique results in lowering the average error as the weight matrices in the network evolve. In layered networks of the kind considered, the changes in the weights after each trial are proportional to the error itself. This leads to a system that settles to a stable weight configuration as the errors become small. However, the changes become zero only for zero gradient of the error in weight space. This zero can represent either a true global minimum or only a local one, but from a practical standpoint this gradient descent algorithm generally results in useful solutions. 
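
The error measure of Equation 2 and a descent along the gradient of the error in weight space may be sketched as follows. The finite-difference gradient used here is a stand-in chosen for brevity and is not the back-propagation computation; the function names, step size and toy problem are assumptions.

```python
import math


def rms_error(targets, outputs):
    """Error of Equation 2 for one target vector and one output vector."""
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(targets, outputs)))


def numeric_gradient(error_fn, weights, h=1e-6):
    """Central-difference estimate of the gradient of the error in weight space."""
    grad = []
    for n in range(len(weights)):
        up = list(weights)
        up[n] += h
        down = list(weights)
        down[n] -= h
        grad.append((error_fn(up) - error_fn(down)) / (2.0 * h))
    return grad


def descend(error_fn, weights, eta=0.01, steps=300):
    """Repeatedly change each weight by -eta times the error gradient."""
    w = list(weights)
    for _ in range(steps):
        g = numeric_gradient(error_fn, w)
        w = [wn - eta * gn for wn, gn in zip(w, g)]
    return w


# Toy problem (hypothetical): two linear outputs o_k = w_k * 1.0 with fixed targets.
targets = [1.0, -2.0]


def toy_error(w):
    return rms_error(targets, [w[0] * 1.0, w[1] * 1.0])


start = [0.0, 0.0]
final = descend(toy_error, start)
print(toy_error(start), toy_error(final))
```

In this toy problem the error falls from its starting value to a small residual as the weights evolve.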
The generalized delta rule is a simple computational algorithm derived from Equation 3. Figure 10 shows a generic layer within a network. In this representation, the output of the layer is a vector of signals $o_j$. Its input vector, $o_i$, may itself be an output from a preceding layer. Output vector $o_j$ may, in turn, provide input to the next layer. Neurons 1001-1004 have an activation function $f_j$ and are coupled to the input vector by a weight matrix $w_{ij}$. The net input to each neuron is given by 

$$net_j = \sum_i w_{ij}\, o_i \tag{4}$$

and the output vector is given by 

$$o_j = f_j(net_j). \tag{5}$$

For the layer in the figure, the weight changes are 

$$\Delta w_{ij} = \eta\, o_i\, \delta_j. \tag{6}$$

In this relationship, $\eta$ is the learning rate, and is the constant of proportionality implicit in Equation 3. If the layer in question is an output layer, then $\delta_j$ is given by 

$$\delta_j = (t_j - o_j)\, f'_j(net_j), \tag{7}$$

where $t_j$ is the target, or desired, output vector and $f'_j$ denotes the differential of the neuron's activation function with respect to the input signal. For linear output neurons, $f'_j$ is just a constant (usually unity). However, if the layer is hidden inside the network, it is not immediately apparent what its target response should be. In this case, $\delta_j$ is computed iteratively using 

$$\delta_j = f'_j(net_j) \sum_k \delta_k\, w_{jk} \tag{8}$$

where $\delta_k$ and $w_{jk}$ refer to the layer immediately after the one in question (i.e. to the right of the layer in Figure 10). So, for example, in the network described by Equation 1, $\delta_j$ for the hidden layer is computed using $\delta_k$ from the output layer, the matrix $W2_{jk}$ and $f'_j = \mathrm{sech}^2$. In practice, the input is first propagated forward through the network. The weight changes are first computed in the last layer, using Equations 6 and 7, and then working backwards through the network layer by layer using Equations 6 and 8. 
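
One learning trial of the generalized delta rule, for the single-hidden-layer network of Equation 1, may be sketched as follows. The function names, the toy target function, the learning rate and the network size are all assumptions chosen to keep the example small; they are not the production data or settings of the preferred embodiment.

```python
import math
import random


def forward(x, w1, w2):
    """Equations 4 and 5: tanh hidden layer followed by linear outputs."""
    hid = [math.tanh(sum(x[i] * w1[i][j] for i in range(len(w1))))
           for j in range(len(w2))]
    out = [sum(hid[j] * w2[j][k] for j in range(len(w2)))
           for k in range(len(w2[0]))]
    return hid, out


def train_step(x, target, w1, w2, eta):
    """One back-propagation trial; w1 and w2 are updated in place."""
    hid, out = forward(x, w1, w2)
    # Equation 7 with a linear output neuron (derivative taken as unity).
    d_out = [target[k] - out[k] for k in range(len(out))]
    # Equation 8 with the tanh derivative sech^2 = 1 - tanh^2.
    d_hid = [(1.0 - hid[j] ** 2) * sum(d_out[k] * w2[j][k] for k in range(len(out)))
             for j in range(len(hid))]
    # Equation 6 applied to the last layer, then to the hidden layer.
    for j in range(len(hid)):
        for k in range(len(out)):
            w2[j][k] += eta * hid[j] * d_out[k]
    for i in range(len(x)):
        for j in range(len(hid)):
            w1[i][j] += eta * x[i] * d_hid[j]


def mean_abs_error(samples, w1, w2):
    """Average absolute error of the single output over a set of samples."""
    return sum(abs(t[0] - forward(x, w1, w2)[1][0]) for x, t in samples) / len(samples)


rng = random.Random(0)


def toy_example():
    """Hypothetical input/target pair standing in for a trace and its etch time."""
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    return x, [0.5 * x[0] - 0.3 * x[1]]


train_set = [toy_example() for _ in range(200)]
test_set = [toy_example() for _ in range(50)]
w1 = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]   # small initial weights
w2 = [[rng.uniform(-0.1, 0.1)] for _ in range(3)]

error_before = mean_abs_error(test_set, w1, w2)
for trial in range(5000):
    x, t = rng.choice(train_set)       # random selection from the training set
    train_step(x, t, w1, w2, eta=0.05)
error_after = mean_abs_error(test_set, w1, w2)
print(error_before, error_after)
```

The hidden-layer deltas are computed before the hidden-to-output matrix is changed, so that Equation 8 uses the same weights that produced the output. Evaluating the test set at intervals during such a loop would mirror the periodic testing described in Section II.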

Thus, neural networks are, in essence, a nonparametric nonlinear learning algorithm. No assumptions are made about the populations of the variables and no assumptions are made about the functional relations of the variables. The only assumption is that there is a cause and effect relation between the inputs and the outputs which can be learned by the neural network. See S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems, Morgan Kaufmann, San Mateo, CA (1991); K. Hornik, M. Stinchcombe and H. White, "Universal Approximation of an Unknown Mapping and its Derivatives Using Multilayer Feedforward Networks," Neural Networks, Vol. 3, 551-560 (1990). 

Since there is in fact such a cause and effect relation between the operations of the inspection of the oxide thickness and adjusting the etch time for the next etch, neural networks could play a role in the integrated circuit fabrication process. By identifying the process signature that reflects the relation between oxide thickness and overetch time, information about the process signature can be used to train the neural network and to make predictions about the etch time adjustment as demonstrated by the description of the preferred embodiment in Section II above. The neural network controller as described produced a standard deviation of oxide thickness comparable to human assisted methods. Thus, the neural network controller offers advantages in terms of greater uniformity of product quality resulting in higher yields and lower costs in the manufacturing process. 

Claims 

1. A method for using a neural network to control a set of one or more variables in a process to yield a product characterized by a set of one or more quality attributes, said method comprising the steps of: 

a. identifying a set of one or more process signatures, wherein said set of one or more process signatures reflects a relation between said set of one or more quality attributes and said set of one or more variables; 

b. measuring said plurality of process signatures to form a record; 

c. providing data from said record to said neural network; 

d. using said neural network to control said set of one or more variables based on said data. 

2. The method of claim 1 wherein said neural network is trained with measurements of said set of one or
more quality attributes and with prior data of said process signature.

3. The method of claim 1 wherein said neural network is trained with the back-propagation technique. 

4. The method of claim 1 further comprising the step of using said neural network to control said set of one 
or more variables based on information from a database. 



5. A method for controlling a set of one or more process variables in a plasma etching process to yield a 
product characterized by a set of one or more quality attributes, said method comprising the steps of: 

a. identifying a set of one or more process signatures reflecting a relation between said set of one or 
more process variables and said set of one or more quality attributes; 
b. measuring said set of one or more process signatures to form a record;

c. providing data from said record to a neural network; 

d. training said neural network with prior data from records of said plurality of process signatures and 
with prior data of said quality attributes; 

e. controlling said set of one or more process variables based on said training and on said data from
said record.

6. The method of claim 5 wherein said set of one or more process variables is etch time. 

7. The method of claim 5 wherein said etching process selectively removes a first layer of material to expose
an underlying second layer of material, said underlying second layer of material having a thickness associated
with it, and wherein said quality attribute is said thickness associated with said second layer upon
completion of said etching process.

8. The method of claim 5 wherein said set of one or more process signatures is a plasma emission. 

9. The method of claim 5 wherein data is gathered during said measuring of said set of one or more process
signatures after said set of one or more process signatures is normalized. 

10. The method of claim 5 wherein said neural network is trained with the back-propagation technique.

11. The method of claim 5 further comprising the step of controlling said set of one or more process variables
based on information from a database. 

12. A method for controlling etch time in a plasma etching process, wherein said etching process selectively
removes a first layer of material to expose an underlying second layer of material, said underlying second
layer of material having a thickness associated with it, said method comprising the steps of:

a. measuring a plasma emission from said plasma etching process to form a record,

b. providing data from said record to a neural network,

c. training said neural network, wherein said training of said neural network comprises computing a
first order linear correction to control said etch time according to the rule

$$ t_{\mathrm{ideal}} = t_{\mathrm{obs}} + \frac{T_{\mathrm{obs}} - T_{\mathrm{des}}}{E_R} $$

where $t_{\mathrm{ideal}}$ is the ideal etch time for a desired thickness of said underlying second layer, $t_{\mathrm{obs}}$ is the actual
etch time, $T_{\mathrm{obs}}$ is the observed thickness of said underlying second layer of material at the end of said
plasma etching process, $T_{\mathrm{des}}$ is the desired thickness of said underlying second layer of material at the
end of said plasma etching process, and $E_R$ is a constant representing the etch rate of said plasma
etching process,

d. controlling said etch time based on said training and on data from said record.
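
As an illustration of the first order linear correction recited in step c, the following minimal sketch assumes the rule reads t_ideal = t_obs + (T_obs - T_des) / E_R, which is how the symbol definitions in the claim are read here. The function name, the units and the numerical values are hypothetical and are not taken from the specification.

```python
def ideal_etch_time(t_obs, thickness_obs, thickness_des, etch_rate):
    """First order linear correction of the etch time (assumed form of the rule).

    t_obs          actual etch time, e.g. in seconds (assumed unit)
    thickness_obs  observed thickness of the underlying layer after the etch
    thickness_des  desired thickness of the underlying layer after the etch
    etch_rate      constant etch rate, in thickness units per time unit
    """
    if etch_rate <= 0:
        raise ValueError("etch_rate must be positive")
    # Too much material left (observed > desired) means the etch should have
    # run longer; too little means it should have been shorter.
    return t_obs + (thickness_obs - thickness_des) / etch_rate


# Hypothetical numbers: 10 thickness units too much oxide remains, and the
# etch removes 2 units per second, so 5 more seconds would have been ideal.
t_ideal = ideal_etch_time(t_obs=100.0, thickness_obs=210.0,
                          thickness_des=200.0, etch_rate=2.0)
print(t_ideal)
```

The value computed this way is what the claim calls the ideal etch time; in training it would serve as the target against which the network's guessed etch time is compared.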






EP 0 602 855 A1 




FIG. 2 (block diagram). Labels: MATERIALS AND CONTROL VARIABLES; PROCESS; NEURAL NETWORK PROCESS MONITOR; OUTPUT PRODUCT. Reference numerals: 202, 204.



FIG. (figure number not legible in source): plot of INTENSITY (vertical axis, ticks at 1000, 2000, 3000, 4000) versus TIME (horizontal axis, ticks at 50, 100, 150).






FIG. 4 (block diagram). Labels: REAL WORLD OBSERVATIONS FROM PLASMA REACTOR; OPTICAL EMISSION TRACE; OBSERVED ETCH TIME; OBSERVED OXIDE THICKNESS; ETCH RATE; DESIRED THICKNESS; NEURAL NETWORK CONTROLLER; GUESSED ETCH TIME; ERROR SIGNAL; FIRST ORDER LINEAR CORRECTION PROCESS; IDEAL ETCH TIME. Reference numerals: 402, 405.


FIG. 7 (block diagram). Labels: MATERIALS & CONTROL VARIABLES; PROCESS; OUTPUT PRODUCT; NEURAL NETWORK INPUT MONITOR.






FIG. 9 (block diagram). Labels: INPUT; NEURAL NETWORK; DESIRED RESPONSE; ERROR.



FIG. 10 







FIG. 11 (block diagram). Labels: MATERIALS AND CONTROL VARIABLES; PROCESS; NEURAL NETWORK PROCESS MONITOR; DATABASE; OUTPUT PRODUCT. Reference numerals: 202, 204, 1106.



FIG. 12 (block diagram). Labels: REAL WORLD OBSERVATIONS FROM PLASMA REACTOR AND DATABASE INFORMATION; OPTICAL EMISSION TRACE; OTHER PROCESS SIGNATURES; HISTORICAL OBSERVATIONS; OBSERVED ETCH TIME; OBSERVED OXIDE THICKNESS; ETCH RATE; DESIRED THICKNESS; NEURAL NETWORK CONTROLLER; GUESSED ETCH TIME; ERROR SIGNAL; FIRST ORDER LINEAR CORRECTION PROCESS; IDEAL ETCH TIME. Reference numerals are garbled in the source (W02, M05).






European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 93 30 9691

DOCUMENTS CONSIDERED TO BE RELEVANT

Classification of the application (Int. Cl.5): H01J37/32, G05B13/02
Technical fields searched (Int. Cl.5): H01J, G05B

Citation of document with indication, where appropriate, of relevant passages:

EP-A-0 443 249 (AMERICAN TELEPHONE AND TELEGRAPH COMPANY)
* page 2, line 40 - line 48 *
* page 3, line 5 - line 41 *
* page 5, line 10 - line 14; claim 1; figure 4 *

IEEE/SEMI INTERNATIONAL SEMICONDUCTOR MANUFACTURING SCIENCE SYMPOSIUM, SAN FRANCISCO, CA, USA,
15-16 JUNE 1992, NEW YORK US, pages 124 - 129
C.D. HIMMEL ET AL. 'A comparison of statistically-based and neural network models of plasma etch behavior.'
* page 126, left column, paragraph 2 - page 127, left column, paragraph 2 *

IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, 8-12 MARCH 1992, SAN DIEGO, CA, US, pages 101 - 108
SU-SHING CHEN 'Intelligent control of semi-conductor manufacturing processes.'
* page 106, paragraph 3 - page 108, paragraph 4 *

JOURNAL OF THE ELECTROCHEMICAL SOCIETY, vol. 139, no. 3, 3 March 1992, MANCHESTER, NEW HAMPSHIRE US,
pages 907 - 914
R. SHADMEHR ET AL. 'Principal component analysis of optical emission spectroscopy and mass spectrometry;
application to reactive ion etch process parameter estimation using neural networks.'
* page 912, right column, paragraph 4 - page 914, right column, paragraph 2 *

Category entries (as extracted; assignment to individual citations is not recoverable): X, Y
Relevant to claim (as extracted; assignment to individual citations is not recoverable): 1,2,4; 5,11; 1-4,12; 5,11; 1-5,8,10,11; 1-5,8,10,11

The present search report has been drawn up for all claims.

Place of search: THE HAGUE
Date of completion of the search: 14 February 1994
Examiner: Schaub, G

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding document






European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 93 30 9691

DOCUMENTS CONSIDERED TO BE RELEVANT

Category P,X:
JOURNAL OF VACUUM SCIENCE AND TECHNOLOGY: PART B, vol. 11, no. 4, August 1993, NEW YORK US,
pages 1314 - 1316
E.A. RIETMAN ET AL. 'Active neural network control of wafer attributes in a plasma etch process.'
* the whole document *
Relevant to claim: 1-12

The present search report has been drawn up for all claims.

Place of search: THE HAGUE
Date of completion of the search: 14 February 1994
Examiner: Schaub, G






